Modularity "for free" in genome architecture? 
Background Recent models of genome-proteome evolution have shown that some of the
key traits displayed by the global structure of cellular networks might be a natural result of
a duplication-diversification (DD) process. One of the consequences of such evolution is the
emergence of a small-world architecture together with a scale-free distribution of interactions.
Here we show that the domain of parameter space where such structure emerges is related to
a phase transition phenomenon. At this transition point, modular architecture spontaneously
emerges as a byproduct of the DD process.

Results Although the DD models lack any functionality and are thus free from meeting
functional constraints, they show the observed features displayed by the real proteome maps
when tuned close to a sharp transition point separating a highly connected graph from a
disconnected system. Close to such a boundary, the maps are shown to display scale-free
hierarchical organization, behave as small worlds and exhibit modularity.
Conclusions It is conjectured that natural selection tuned the average connectivity in such a
way that the network reaches a sparse graph of connections. One consequence of such a scenario is
that the scaling laws and the essential ingredients for building a modular net emerge for free close
to such a transition.
I. INTRODUCTION

The intimate structure of cellular life is largely associated
with the networks of interactions among different types
of molecules. The structure of cellular networks, from the
genome and the proteome to the metabolome, strongly
departs from a simple random graph (3(f). Instead, these
nets display a highly heterogeneous architecture: most
units (genes, proteins or metabolites) are linked to a few
other units, but invariably a few units exhibit a large number
of links. Such heterogeneity has also been found in a
wide spectrum of complex systems, from natural to artificial
(8). More importantly, the topological organization
of complex nets might shape their efficiency, their robustness
and their fragility under perturbations Q).

The analysis of network structure and dynamics offers
a new window to answer questions relating to the evolution
of biocomplexity (!2(f). Networks are organized in
highly non-random ways, and the topological organization
of their connectivity allows some characteristic traits to be
defined quantitatively. Understanding the origins of such
properties requires an understanding of the evolutionary
mechanisms that generate these networks. Since properly
defined quantitative traits can be measured, models
are strongly constrained to reproduce a well-defined set
of features.

From a statistical point of view, protein-protein or 
gene-gene interaction maps can be viewed as a random 
network (0; in which the vertices represent the pro- 
teins (genes) and an edge between two vertices indicates 




FIG. 1 Two examples of modular networks. In (a) three 
basic modules are involved, each one involving a set of nodes 
randomly connected among them with some given probability. 
Each node can also be connected (with a smaller probability) 
with other nodes in other modules. In (b) a hierarchical net- 
work is shown, created by repeating a given basic motif at 
different levels JI2I). 



the presence of an interaction between the respective pro- 
teins. In this paper we restrict our analysis to an undi- 
rected graph of protein-protein interactions, but some of 
our conclusions can be translated to regulatory networks. 

Mathematically, the proteome graph is defined by a
pair $\Omega_p = (W_p, E_p)$, where $W_p = \{p_i\}$ $(i = 1, \ldots, N)$
is the set of $N$ proteins and $E_p = \{\{p_i, p_j\}\}$ is the set
of edges/connections between proteins. The adjacency
matrix $\xi_{ij}$ indicates that an interaction exists between
proteins $p_i, p_j \in \Omega_p$ ($\xi_{ij} = 1$) or that the interaction is
absent ($\xi_{ij} = 0$). Two connected proteins are thus called
adjacent and the degree of a given protein is the number
of edges that connect it with other proteins.
The analysis of metabolic pathways, protein interaction
maps, genetic regulatory networks and gene expression
data reveals that cellular webs belong to a class
of network topologies known as scale-free (SF) networks
l|T3 ll^ . A SF net is characterized by a so-called degree
distribution $P(k)$ displaying power-law behavior. Here
$P(k)$ is the probability of finding a unit which is linked
to $k$ other units, and it typically decays as $P(k) \sim k^{-\gamma}$ with
$2 < \gamma < 3$. Here links correspond, for protein maps,
to protein-protein interactions. These networks are also
small worlds: the average number of steps $d$ required in
order to jump from one protein to another through the
network is very small 1|12|) .

Scale-free graphs have been shown to emerge from different
types of mechanisms |3 El '2?) . Most of them
involve (explicitly or implicitly) a multiplicative process
known as preferential attachment (0). In its standard
form, it relies on a popularity principle (the rich get richer):
as new nodes are added to the system, they tend to attach
preferentially to nodes with higher degree. In the
Barabási-Albert (BA) model, this process leads to a SF
distribution with $P(k) \sim k^{-3}$ (Q).

An additional feature is the presence of modular architecture
with well-defined hierarchical properties |2^ . An
example of a modular network is shown in figure 1(a). Here
three sets of nodes appear more connected among themselves
than with other nodes in the graph. Three modules are
thus naturally defined (although only from a topological
point of view). In this particular model (defined in f24))
nodes inside each module are randomly wired with some
probability $p$, as in so-called Erdős-Rényi (ER) graphs.
They are also linked to nodes in other modules with a
probability $q < p$. Such networks exhibit a Poissonian
degree distribution. Cellular networks, however, are not
Poissonian, but are certainly modular, exhibiting hierarchical
organization l|24l) .
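The modular random graph described above can be sketched in a few lines. This is only an illustrative sketch, not the construction used in the cited reference: the function names, the seed handling and the parameter values (three modules of 20 nodes, p = 0.5, q = 0.02) are assumptions chosen for the example.

```python
import random

def modular_graph(n_modules, module_size, p, q, seed=0):
    """Edge set of a graph with planted modules (illustrative sketch).

    Nodes in the same module are linked with probability p,
    nodes in different modules with probability q (q < p).
    """
    rng = random.Random(seed)
    n = n_modules * module_size
    edges = set()
    for i in range(n):
        for j in range(i + 1, n):
            same_module = (i // module_size) == (j // module_size)
            if rng.random() < (p if same_module else q):
                edges.add((i, j))
    return edges

def count_edges(edges, module_size):
    """Return (number of intra-module edges, number of inter-module edges)."""
    intra = sum(1 for i, j in edges if i // module_size == j // module_size)
    return intra, len(edges) - intra

# Hypothetical parameter values, chosen only for illustration.
edges = modular_graph(n_modules=3, module_size=20, p=0.5, q=0.02)
intra, inter = count_edges(edges, module_size=20)
print(intra, inter)
```

With q much smaller than p, most links fall inside modules, which is what makes the three groups visible from topology alone.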

In table I we summarize the differences between networks
generated through random wiring (ER), preferential
attachment (from the BA model) and the actual
proteome map. It is interesting that none of the models
gives modular architecture 1,24^) . Protein modules,
for example, result from the binding of multiple protein
molecules forming stable complexes. The presence of
hierarchies has been shown to be measurable from the
so-called clustering coefficient $C_i$, which measures the
fraction of the neighbours of node $i$ that are themselves
neighbours, i.e.



$$ C_i = \frac{2 E_i}{k_i (k_i - 1)} \qquad (1) $$



In this formula, $E_i$ is the number of links between the
neighbours of node $i$ (which has degree $k_i$), $k_i(k_i - 1)/2$ being thus the
total number of possible links between neighbours of $i$.
The average of $C_i$, that is, $C(N) = \sum_i C_i / N$, describes
in general the clustering coefficient of a network. This
measure has been observed to be much higher in real
networks than for random graphs in a variety of fields
ijl j) , and in particular it has also been shown to display


Property | ER graph    | BA model      | proteome
C(N)     | $N^{-1}$    |               | independent
C(k)     | independent |               |
P(k)     | Poissonian  | $k^{-\gamma}$ | $(k + k_0)^{-\gamma} e^{-k/k_c}$
Modules  | no          | no            | yes



TABLE I Global properties displayed by different types of
graphs, compared with those exhibited by a hierarchical system,
such as the proteome map. Here the scaling exponent is
$2 < \gamma < 3$.



a scale-free distribution 
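As a concrete illustration of equation (1), the sketch below computes $C_i$ and the network average $C(N)$ for a small graph stored as a dictionary of neighbour sets. The data structure, the function name and the convention $C_i = 0$ for nodes with fewer than two neighbours are assumptions made for this example.

```python
def clustering(adj):
    """Return {node: C_i} with C_i = 2 E_i / (k_i (k_i - 1)).

    adj maps each node to the set of its neighbours (undirected graph).
    Nodes with degree below 2 get C_i = 0 (a convention assumed here).
    """
    coeff = {}
    for i, neigh in adj.items():
        k = len(neigh)
        if k < 2:
            coeff[i] = 0.0
            continue
        neigh_list = list(neigh)
        # E_i: number of links among the neighbours of i
        links = sum(
            1
            for a in range(k)
            for b in range(a + 1, k)
            if neigh_list[b] in adj[neigh_list[a]]
        )
        coeff[i] = 2.0 * links / (k * (k - 1))
    return coeff

# Toy graph: a triangle (0, 1, 2) plus a pendant node 3 attached to node 0.
adj = {0: {1, 2, 3}, 1: {0, 2}, 2: {0, 1}, 3: {0}}
c = clustering(adj)
average_c = sum(c.values()) / len(c)   # C(N)
print(c, average_c)
```

In this toy graph node 0 has three neighbours with a single link among them, so $C_0 = 1/3$, while nodes 1 and 2 have $C_i = 1$.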

It is generally acknowledged that modules define functional
units and as such are the target of selection
(0; Is^ . In this context, some authors have suggested that
general "design principles -profoundly shaped by the
constraints of evolution- govern the structure and function of
modules" d (see also jsllil).

Modules have been found in biological systems at 
multiple levels, from RNA structures (Q) to the cere- 
bral cortex (see l|2^ and references therein). This 
widespread character of modular organization pervades 
the functional association between compartmentalization 
and evolution. Modules have been variously defined as 
functionally buffered, robust, independently controlled, 
plastic in composition and interconnectivity and evolu- 
tionarily conserved. The evolutionary conservation of 
modules is clearly appreciated in gene networks involved 
in early development l(2^ IstI ? ) . The argument is that 
the special features of some of these modules are tightly 
linked to their robustness under different sources of noise. 

The modular character of biological networks is as- 
sumed to be a consequence of both their robustness 
and evolvability |44l[). In this context, modularity would 
evolve through a decrease of pleiotropy (45). Since modules
define somewhat separate compartments, they would
act as buffers against lethal mutations, perhaps facilitating
variation l)4lTl . In a different context, it has been
suggested that modularity might arise from the intrin- 
sic structure of the non-metric mapping between geno- 
type and phenotype l)33t) . Although functionality must 
pervade the selection of some modular structures, here 
we show, by exploring available data and simple models 
of proteome evolution, that proto-modules might actu- 
ally result from a duplication-divergence process with- 
out any predefined functional meaning. If correct, this 
observation would actually indicate that modular struc- 
tures would be already in place as a byproduct of genome 
growth. 



II. RESULTS AND DISCUSSION 

A. Phase transition in the proteome evolution model 

Any model involving genome evolution must take into
account the leading mechanism that appears to be responsible
for its growth: gene duplication. Through gene
duplication l)23() new elements are incorporated into the
system, initially introducing an element of redundancy,
since genes are duplicated and thus their connections
with others too. Afterwards, divergence or loss of function
occurs, and either new functions/interactions are developed
or pseudogenes (i.e. nonfunctional copies of
duplicated genes) are generated.

In trying to understand genome evolution from a network
perspective, two possible approaches can be followed.
First, the network architecture is given and the
dynamics of gene regulation and its stability can be
explored by changing well-defined network parameters,
such as average connectivity (fl^ I2(tI) . A different
approach would consider the process of network growth
itself. A simple model of this process can be constructed
by using a graph representation of the genome,
where genes are the nodes and their interactions are the edges. At
each time step a duplication event takes place, and the
number of genes in the system provides a natural time
scale, although the rate of link rewiring is much faster than
the rate of duplication (see below). Two independent
studies, involving both analytic results and data analysis,
presented simple models of proteome network evolution
through gene duplication and diversification. These
models were able to explain a large part of the observed
complexity of protein network architecture, particularly
the presence of small-world patterns and the scale-free
behavior. Their results were compared with some of the
statistical patterns observed in proteome
maps (lU IM 113; 113 EH). Two other studies presented
closely related models using protein domains as the basic
units (25; 46), again revealing that the complex patterns
found in cellular interaction maps emerge from these
microscopic laws of genome evolution. Further work [s^)
has confirmed the basic predictions presented in those
original papers. Other work
in this area involves the exploration of the origins of the
protein universe structure, again under simple models of
duplication and diversification |Tot Is^ . Although previous
papers have explored some average traits of these
interaction maps (such as their scale-free structure and
the presence of small-world architecture), here we analyse
the patterns of correlations emerging from them and in
particular the presence or absence of modular organization.

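The time evolution can be described in terms of the
number of links $L_n$, i.e. we can write down a discrete
equation for the link dynamics:

$$ L_{n+1} = L_n + \Gamma(\{K_i(n)\}, \delta, \alpha) \qquad (2) $$

or, using the approximation $dL_n/dn \approx L_{n+1} - L_n$, the
continuous model:

$$ \frac{dL_n}{dn} = \Gamma(\{K_i(n)\}, \delta, \alpha) \qquad (3) $$

Using the chain rule (with $L_n = n K_n / 2$, where $K_n$ is the
average degree), we have

$$ \frac{dL_n}{dn} = \frac{K_n}{2} + \frac{n}{2} \frac{dK_n}{dn} \qquad (4) $$

and the previous dynamical equation for links is transformed
into a differential equation for the average degree:

$$ \frac{dK_n}{dn} = \frac{2}{n} \Gamma(\{K_i(n)\}, \delta, \alpha) - \frac{K_n}{n} \qquad (5) $$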



Here the functional form of $\Gamma(x)$ will depend on some
given (perhaps time-dependent) parameters such as the rate
of removal $\delta$ or creation $\alpha$ of links, as well as on the internal
state, as defined by the distribution of links at a given
step, here indicated as $\{K_i(n)\}$ (with $i = 1, \ldots, n$).

Different functional forms might be chosen, including
rates of change that depend on the degree of the node,
as suggested by some studies. Although the duplication rate
would be expected to depend on the number of links too,
this seems controversial f 6-: fl^ Il7j) .

The simplest situation would involve pure duplication
with no link removal or rewiring. This situation corresponds
to $\Gamma(\{K_i(n)\}, \delta, \alpha) = K_n$, since each duplication
copies on average $K_n$ links. With $L_n = n K_n / 2$ we would
then have $dK_n/dn = K_n/n$, with a straightforward analytic
solution:

$$ K_n = \frac{K_0}{n_0} \, n \qquad (6) $$

where $n_0$ and $K_0$ are the initial number of nodes and average
degree, respectively. As a consequence, the connectivity
increases without bound as the network grows. Since cellular
networks are sparse, we conclude that links have to
be deleted at a fast pace in order to reach a low, finite
number of links per unit.

The model analysed in l|23) is defined by the following
rules. We start from a set of $m_0$ connected nodes, and
at each time step we perform the following operations:

(i) One node of the graph is selected at random and
duplicated.

(ii) The links emanating from the newly generated
node are removed with probability $\delta$.

(iii) New links (not previously present after the duplication
step) are created between the new node and
any other node with probability $\alpha$. Although available
data indicate that new interactions are likely
to be formed preferentially towards proteins with
high degree, here we do not consider this constraint.

Step (i) implements gene duplication, in which both 
the original and the replicated proteins retain the same 
structural properties and, consequently, the same set of 
interactions. The rewiring steps (ii) and (iii) imple- 
ment the possible mutations of the replicated gene, which 
translate into the deletion and addition of interactions 
with different proteins, respectively. The process is re- 
peated until N proteins have been obtained. 
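Rules (i)-(iii) can be sketched as a short simulation. This is a minimal sketch under stated assumptions, not the authors' code: the seed graph of two linked nodes ($m_0 = 2$), the dictionary-of-sets representation and the names `sole_model`, `delta` and `alpha` are all choices made for the example.

```python
import random

def sole_model(n_final, delta, alpha, seed=0):
    """Grow a graph by duplication, link removal and link addition.

    Returns {node: set(neighbours)}. Starts from two linked nodes
    (an assumed seed graph) and stops when n_final nodes exist.
    """
    rng = random.Random(seed)
    adj = {0: {1}, 1: {0}}
    while len(adj) < n_final:
        new = len(adj)
        parent = rng.randrange(new)
        # (i) duplication: the copy inherits all links of the parent
        inherited = set(adj[parent])
        # (ii) each inherited link of the new node is removed with prob. delta
        kept = {j for j in inherited if rng.random() >= delta}
        # (iii) links not present after duplication are created with prob. alpha
        added = {
            j for j in range(new)
            if j not in inherited and rng.random() < alpha
        }
        adj[new] = kept | added
        for j in adj[new]:
            adj[j].add(new)
    return adj

# Hypothetical parameter values, for illustration only.
graph = sole_model(n_final=200, delta=0.55, alpha=0.005)
mean_degree = sum(len(v) for v in graph.values()) / len(graph)
print(len(graph), mean_degree)
```

Because only the links of the new copy are subject to removal, the parent keeps its full set of interactions, which is the feature contrasted with the second model below.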

The model described in l)4l|) is very similar, but introduces
some relevant differences. Here duplication (i)
is also followed by two probabilistic rules which operate
independently. The first (ii) is link deletion. For each of
FIG. 2 Rules of proteome growth in the four possible scenarios. First, (1) duplication occurs after randomly selecting a node
(small arrow). Then (2) deletion of connections occurs with probability $\delta$. This event can be correlated (C), when the deleted
links are connected to the newly generated node, or uncorrelated (NC), when all links are considered for deletion. Finally
(3) new connections are generated with probability $\alpha$, again in a correlated or uncorrelated way. The time scales at which
different events occur are known to be very different: duplication takes place at a much slower rate, whereas rewiring is much
faster. Additionally, the specific rates at which each event occurs might involve preferential attachment to proteins of higher
connectivities. All these variants can be included.



the nodes $p_j$ linked to both $p_i$ and its duplicate $p_i'$, we
choose randomly one of the two links and remove
it with probability $\delta$. Additionally, a new interaction
connecting the two proteins (the parent and the duplicate)
is introduced with probability $\pi$. The last rule
will naturally increase the number of triangles in the system
and thus provide a source of high clustering. The
rewiring process seems to be more appropriately defined,
since the removal of one of the alternative links allows
the function that was somehow present before the duplication
event to be "conserved". In Solé's model, the whole set of
links of the duplicated gene is preserved and loss of connections
affects only the new copy. By using Vázquez's
approach, more flexibility is allowed and the interaction
map is more likely to remain connected. As defined, it
is important to note that duplicates will diverge only to
some extent: if a gene with degree $k_i$ is duplicated,
only $\delta k_i$ links will be removed on average. To reach
higher levels of divergence (as suggested by the real proteome)
we need to remove links from the rest of the map
(and not just from the duplicate). Such a refinement
is well founded and has also been considered (see discussion),
providing essentially the same results in relation
to network architecture (Solé et al., in preparation).
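The second set of rules can be sketched in the same style. Again this is an illustrative sketch rather than the published implementation: the two-node seed graph and the names `vazquez_model`, `delta` and `pi_link` (standing for the probability written $\pi$ in the text) are assumptions.

```python
import random

def vazquez_model(n_final, delta, pi_link, seed=0):
    """Duplication model in which either copy can lose a shared link.

    Returns {node: set(neighbours)}. Starts from two linked nodes
    (an assumed seed graph).
    """
    rng = random.Random(seed)
    adj = {0: {1}, 1: {0}}
    while len(adj) < n_final:
        new = len(adj)
        parent = rng.randrange(new)
        # (i) duplication: the copy inherits all links of the parent
        adj[new] = set(adj[parent])
        for j in adj[new]:
            adj[j].add(new)
        # (ii) for every shared neighbour j, with probability delta remove
        # one of the two links (j, parent) or (j, new), chosen at random
        for j in sorted(adj[new]):          # sorted() makes a copy to iterate
            if rng.random() < delta:
                loser = parent if rng.random() < 0.5 else new
                adj[loser].discard(j)
                adj[j].discard(loser)
        # (iii) link parent and duplicate with probability pi_link
        if rng.random() < pi_link:
            adj[parent].add(new)
            adj[new].add(parent)
    return adj

# Hypothetical parameter values, for illustration only.
graph = vazquez_model(n_final=200, delta=0.55, pi_link=0.1)
n_links = sum(len(v) for v in graph.values()) // 2
print(len(graph), n_links)
```

Since one of the two copies always retains each shared interaction, every neighbour of the parent stays attached to at least one member of the pair, in line with the remark that this variant is more likely to keep the map connected.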

The two models collapse into a single mean field description
where the average connectivity follows the dynamics:

$$ \frac{dK_n}{dn} = \frac{1}{n} \left( K_n + \phi(n, K_n) - 2\delta K_n \right) \qquad (7) $$

where $\phi = 2\alpha(n - K_n)$ in Solé's model and $\phi = 2\pi$
in Vázquez's model. Actually, in a previous paper
(? ? ) we showed that in order to have convergence
in the system towards a scale-free stationary distribution
we need a very small rate of link addition (consistent
with observations). If we assume that $\alpha \sim O(1/n)$, then
a single link is added on average each step and thus the
two models are identical in the low-addition limit: specifically,
if the graph is sparse, we have $\alpha(n - K_n) \approx \pi$. In
this case we have a dynamical equation



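$$ \frac{dK_n}{dn} = -\frac{2\delta - 1}{n} K_n + \frac{2\pi}{n} \qquad (8) $$

which has an associated general solution:

$$ K_n = e^{-\eta(n)} \left[ \int \frac{2\pi}{n} e^{\eta(n)} \, dn + C \right] \qquad (9) $$

where $\eta(n) = \int (2\delta - 1) \, dn / n = (2\delta - 1) \ln n$.
This gives:

$$ K_n = \frac{2\pi}{2\delta - 1} + C \, n^{-(2\delta - 1)} \qquad (10) $$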



For $\delta > \delta_c = 1/2$, the previous system converges to a graph
with a finite average degree

$$ K_\infty = \lim_{n \to \infty} K_n = \frac{2\pi}{2\delta - 1} \qquad (11) $$

Otherwise, the average connectivity will diverge, $K_\infty \to \infty$.
The critical removal rate $\delta_c = 1/2$ thus defines a phase
transition separating a phase with a highly-connected
system ($\delta < \delta_c = 1/2$) from a sparse phase ($\delta > \delta_c$) where
a finite number of links will be observed. In this phase,
the network becomes fragmented into many pieces. It
FIG. 3 Phase transition in the genome growth models. Here
N = 10^ and averages have been performed over R = 10
replicas. The size of the largest component and the average
component size are shown against the rate of link removal $\delta$. The
predicted phase transition occurs at $\delta_c \approx 0.5$. Due to the
finite (small) size of our networks, the transition appears to
be less sharp than expected.



is interesting to note that, under the present conditions,
the long-term behavior of the average connectivity does
not depend on the rate of link addition. What is really
important is that the rates of link addition and link
removal are similar, so that $\langle k \rangle$ can reach a stationary
value. Moreover, it can be shown that although no explicit
preferential attachment is included here, the multiplicative
nature of the process (in which proteins having
more links are more likely to have them copied) actually
leads to an effective preferential attachment (39).
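The two regimes of the mean-field equation can be checked numerically. The sketch below iterates the low-addition form $dK_n/dn = [(1 - 2\delta) K_n + 2\pi]/n$ with unit steps in $n$; the function name, the starting values `k0` and `n0`, the stopping size and the parameter values are assumptions made for this illustration.

```python
def mean_field_degree(delta, pi_link, k0=10.0, n0=10, n_max=10**5):
    """Iterate K_{n+1} = K_n + ((1 - 2*delta) * K_n + 2*pi_link) / n.

    A simple unit-step (Euler) version of the mean-field equation;
    k0 and n0 are assumed initial conditions.
    """
    k = k0
    for n in range(n0, n_max):
        k += ((1.0 - 2.0 * delta) * k + 2.0 * pi_link) / n
    return k

# Sparse phase (delta > 1/2): K approaches 2*pi / (2*delta - 1).
sparse = mean_field_degree(delta=0.75, pi_link=0.5)
predicted = 2.0 * 0.5 / (2.0 * 0.75 - 1.0)

# Dense phase (delta < 1/2): K keeps growing with n.
dense = mean_field_degree(delta=0.25, pi_link=0.5)
print(sparse, predicted, dense)
```

For $\delta > 1/2$ the iteration settles near $2\pi/(2\delta - 1)$, with a residual that decays as a power of $n$, whereas for $\delta < 1/2$ the average degree grows with system size.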

We can test this prediction by studying the behavior
of the model under different rates of link deletion. In
order to measure the impact of this rate on the network's
architecture, we use two different, but closely related
measures: (1) the normalized largest component size $S$
and (2) the average, normalized component size $\langle s \rangle$. If
$C(\Omega_p) = \{\Omega_1, \Omega_2, \ldots, \Omega_c\}$ is the set of connected
components (subgraphs) of the proteome map, so that

$$ \Omega_p = \bigcup_{i=1}^{c} \Omega_i \qquad (12) $$

and $n_i = |\Omega_i|$ indicates their size (so that $\sum_i n_i = N$),
we define:

$$ S = \frac{1}{N} \max_i \{n_i\} \qquad (13) $$

$$ \langle s \rangle = \frac{1}{N} \, \frac{1}{c} \sum_{i=1}^{c} n_i \qquad (14) $$

In figure 3(a-b) we display the two measures against $\delta$ for
an N = 10^ protein network. Close to $\delta_c$ we can appreciate
a clear change. The two phases are clearly identified,
with the connected one showing $S \approx 1$, $\langle s \rangle \approx 1$ and the
fragmented phase showing $S \approx 1/N$, $\langle s \rangle \approx 1/N$. In 3(a)
we can see that $S$ decreases slowly close to $\delta_c$, where
only about half of the nodes remain connected within
the largest component. The sharpness of the transition
becomes much more obvious in 3(b). Here we clearly
appreciate the impact of rewiring on the network's structure,
indicating that a large fraction of the overall network
structure is formed by small, isolated components. In figure 4
we can see some examples of the graphs generated
(largest components) at different rates of deletion.
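The two order parameters can be computed from any graph with a breadth-first search over its connected components. In this sketch the average normalized component size is taken to be the mean component size divided by $N$, an assumption consistent with the limiting values quoted above ($\approx 1$ when connected, $\approx 1/N$ when fully fragmented); the function names and the toy graph are illustrative.

```python
from collections import deque

def component_sizes(adj):
    """Sizes n_i of the connected components of {node: set(neighbours)}."""
    seen = set()
    sizes = []
    for start in adj:
        if start in seen:
            continue
        seen.add(start)
        queue = deque([start])
        size = 0
        while queue:
            node = queue.popleft()
            size += 1
            for nb in adj[node]:
                if nb not in seen:
                    seen.add(nb)
                    queue.append(nb)
        sizes.append(size)
    return sizes

def order_parameters(adj):
    """Return (S, <s>): normalized largest and average component size."""
    sizes = component_sizes(adj)
    n = len(adj)
    largest = max(sizes) / n
    average = sum(sizes) / (len(sizes) * n)   # assumed definition of <s>
    return largest, average

# Toy graph: a triangle, a linked pair and one isolated node (N = 6).
adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1}, 3: {4}, 4: {3}, 5: set()}
S, s_avg = order_parameters(adj)
print(S, s_avg)
```

Applied to graphs grown at different deletion rates, the same two functions would trace out curves of the kind described for figure 3.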



III. HIERARCHICAL ORGANIZATION, MODULARITY 
AND CORRELATIONS 

Previous papers on genome/proteome architecture
have mainly described the heterogeneous character of the
protein-protein map as well as a few large-scale features
such as the clustering coefficient or the network's diameter.
Beyond such measures, which only capture a limited part
of the network's structure, correlations offer a much better
view of its internal organization.

One measure of correlations can be easily obtained by
looking at the set of conditional probabilities $P_c(k|k')$
that a protein having $k$ links is connected to a protein
with $k'$ links l|2]]) . If no correlations exist (as would
occur in a purely random network) then we would have
$P_c(k|k') = P(k)$. We can analyse the average connectivity
$\langle k_n \rangle$ defined as:

$$ \langle k_n \rangle = \sum_{k'} k' P_c(k|k') \qquad (15) $$

(which is just $\langle k_n \rangle = \langle k \rangle$ in the absence of correlations).
Data from PIN gives a scaling law $\langle k_n \rangle \sim k^{-\nu}$ with
$\nu \approx 0.30 \pm 0.03$, as shown in figure 5(a) (open triangles).
The result from Solé's model close to the phase transition
is also shown (black circles), with $\nu_{SM} \approx 0.32 \pm 0.06$. This
scaling law indicates that there is strong anticorrelation
among nodes with low and high degree. Hubs tend not to
be connected to each other; instead they are connected
to low-degree proteins. This type of network
is also known as disassortative. The scaling appears to
behave the same way in both data and model, but the
FIG. 4 The architecture of the proteome map, as generated by the simple model for different values of the deletion rate $\delta$. As
predicted by the mathematical model, two well-defined phases are present. For the first, when $\delta < \delta_c \approx 0.5$, the protein map
is highly connected and most elements have links to others. Conversely, for $\delta > \delta_c$ the graph is fragmented into many pieces
and many components have no links or belong to small isolated subnets. Close to the transition domain, we have a sparse
graph with the statistical features displayed by the real proteome map. Such a graph displays modular organization, in spite of
a complete lack of functionality in the definition of the model rules.



higher average connectivity predicted by the model actually
shifts the in silico law towards higher values. This
difference is easily removed when the model is expanded
to allow links to be removed in a correlated way, not restricted
to the recently duplicated node.
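The degree-degree correlation discussed here can be estimated directly from a graph by averaging the degree of the neighbours of all nodes that share the same degree $k$. The sketch below is one simple way to do this; the function name and the star-graph example are assumptions made for illustration.

```python
from collections import defaultdict

def neighbour_degree_by_k(adj):
    """Average degree of the neighbours of nodes with degree k.

    adj maps each node to the set of its neighbours. Returns {k: value},
    skipping isolated nodes, which have no neighbours to average over.
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for node, neigh in adj.items():
        k = len(neigh)
        if k == 0:
            continue
        totals[k] += sum(len(adj[j]) for j in neigh) / k
        counts[k] += 1
    return {k: totals[k] / counts[k] for k in totals}

# Toy example: a star, the extreme case of a hub linked only to leaves.
star = {0: {1, 2, 3, 4}, 1: {0}, 2: {0}, 3: {0}, 4: {0}}
knn = neighbour_degree_by_k(star)
print(knn)
```

A curve that decreases with $k$, as in the star, is the signature of a disassortative network; the exponent $\nu$ would be obtained from a fit on logarithmic axes.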

Similarly, the presence of hierarchical organization can
be highlighted by looking at the clustering-degree function
$C(k)$. As discussed in the introduction, this function
provides a statistical test for the presence of hierarchies
in graph structure. As we can see in figure 5(b), both
the proteome map and its in silico counterpart display a
non-uniform behaviour of the clustering against degree.
This gives further support to the presence of modular
structure (see below).

A more detailed, complete view of the correlation
structure of both model and real maps is given by correlation
profiles (CP) as defined in l|2]|l . In order to compute
it, we calculate the joint probability $P(k_i, k_j)$, with
$1 \le k_i, k_j \le N$, that two proteins of degrees $k_i$ and $k_j$
are connected to each other. We also compute the probability
$P_r(k_i, k_j)$ obtained by randomizing the same network (i.e. a null
model with no correlations). Significant correlations will
be observed through systematic deviations of the ratio

$$ T(k_i, k_j) = \frac{P(k_i, k_j)}{P_r(k_i, k_j)} \qquad (16) $$

from the null model (i.e. deviations from $T(k_i, k_j) = 1$).
In figure 6 the results from the CP are shown for both
the real yeast proteome (Y) and different models (A, B, W).

Two prominent features are observed in 6(Y). The first,
consistent with the previous analysis of $\langle k_n \rangle$, is the
presence of anticorrelation between nodes of given degree.
This is indicated by the red spots: nodes with
high degree are not connected among themselves, but are
typically linked to proteins with low degree. A second feature
is the presence of significant correlation among proteins
with degrees close to $k_i \approx 10$. Actually, a wider domain
close to the diagonal is implicated, indicating the presence
of sets of proteins forming multiprotein complexes
Q. Both DD models (figures 6(A,B); here (A) is Solé's
model and (B) Vázquez's model) naturally give the red
spots at the correct locations in the CP. Additional correlations
are shown near the $(k_i, k_j) \approx (10, 10)$ zone. In
(A) two spots are observed around this location, whereas
in (B) the correlation is present close to the diagonal,
although less pronounced. The first feature is a result of
the intrinsic dynamics shown by the DD models, in which
rapid divergence after duplication allows initially linked
hubs to become disconnected. The second feature provides
a good example of how truly functional constraints
(those defined by protein complexes) shape real genome
architecture. As discussed by Maslov and Sneppen, multiprotein
complexes are largely responsible for this feature.
The fact that the DD models do not display this
structure is an indication that their lack of functionality is
likely to explain the absence of the observed pattern.
FIG. 5 Comparison between correlations in the proteome
model at $\delta = 0.55$ and the observations from the yeast proteome
map (here the models used provided a connected component
with the same size as the yeast map). In (a) the
correlation scaling for the average connectivity is shown, with
a fit for both yeast data (circles) and model (triangles). Although
the scaling behavior is the same, the larger number
of links predicted by the model shifts the expected average
towards higher values. In (b) we plot the scaling behavior of
the clustering against degree. The dashed line indicates the
expected scaling behavior assuming hierarchical organization
(see text).



For comparison, we also display the correlation profile
obtained from a different model of proteome evolution
l|4^ . This is actually a particular example of a model
presented by Dorogovtsev and Mendes (fll') (DM) in
which no change in the number of nodes is allowed, only
rewiring. Here duplicated genes play no role and thus
no correlations from duplication are preserved. Interactions
are added and eliminated at given rates, and these
rewiring rules are applied using preferential attachment. Under
a strict balance between addition and deletion (again,
we have a phase transition between explosion and fragmentation)
a power law in the degree distribution is obtained.
But any correlation is lost under this type of
approach (hence the lack of clustering or modularity).
This is illustrated in figure 6(W), where the correlation
profile obtained from the DM model with the parameters used in
1)42 : is shown. A visual inspection reveals a proteome
map with little relation to the observed one. This result
should warn us against performing comparisons between
model and real network data limited to a single
topological property.

The previous correlations displayed by DD models and
the evidence of a hierarchical organization strongly indicate
that some type of modular architecture should be
expected. In order to properly detect modules, we use
the topological overlap method |2^. An overlap matrix
$O_T(i,j)$ is defined as:

$$ O_T(i,j) = \frac{J_n(i,j)}{\min\{k_i, k_j\}} \qquad (17) $$




FIG. 6 Comparison between degree correlations in the proteome
model at $\delta = 0.55$ and the observations from the yeast
proteome map. (Y) Real yeast network, (A) Solé's model, (B)
Vázquez's model and (W) Wagner's model. In (Y), one can
observe the two red spots for high degree nodes that are linked
to low degree ones, and some correlation at about $k \sim 10$.
As is apparent, (A) and (B) resemble the real case (Y),
whereas (W) does not, highlighting the importance of the
internal structure besides the degree distribution.



Here $J_n(i,j)$ is the number of proteins to which both $p_i$
and $p_j$ are linked. The denominator gives the smallest degree
of the pair $\{k_i, k_j\}$. Since both terms are constrained
to the interval $(0, N)$ and the number of shared neighbours
cannot exceed the smaller of the two degrees, the overlap
matrix is normalized, i.e. $0 \le O_T(i,j) \le 1$. This matrix can then be displayed
as a two-dimensional plot with a color scale indicating
the relative amount of overlap between two given nodes.
The set of nodes is also arranged with an appropriate algorithm
so that elements belonging to the same module
appear close within the matrix. Two examples of these
maps are shown in figure 7, for the two models explored
here. We can clearly appreciate the presence of proto-modules,
as shown by the clusters of closely connected
elements. A hierarchy of such clusters, defining a set of
nested modular structures, is observed.
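The overlap matrix of equation (17) is straightforward to compute from neighbour sets. The sketch below follows the definition given in the text (shared neighbours divided by the smaller degree); the function name, the convention of returning zero when one node is isolated, and the four-node example are assumptions. The reordering step used to display modules is not included.

```python
def topological_overlap(adj):
    """Return {(i, j): O_T(i, j)} for all pairs i < j.

    adj maps each node to the set of its neighbours. O_T is the number of
    common neighbours divided by min(k_i, k_j); pairs involving an
    isolated node are assigned 0 (a convention assumed here).
    """
    nodes = sorted(adj)
    overlap = {}
    for a, i in enumerate(nodes):
        for j in nodes[a + 1:]:
            k_min = min(len(adj[i]), len(adj[j]))
            common = len(adj[i] & adj[j])
            overlap[(i, j)] = common / k_min if k_min > 0 else 0.0
    return overlap

# Toy example: nodes 0 and 1 share both neighbours (2 and 3).
adj = {0: {2, 3}, 1: {2, 3}, 2: {0, 1}, 3: {0, 1}}
ot = topological_overlap(adj)
print(ot)
```

Pairs with overlap close to one have nearly identical neighbourhoods; grouping such pairs, for instance with a standard hierarchical clustering routine, is what produces the block structure described for figure 7.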



IV. DISCUSSION 

The emergence of modularity is one of the key problems
of evolutionary biology. Modules are common to
both natural and artificial systems (15) and it is generally
agreed that modularization allows a well-defined functional
separation with enhanced robustness against component
failure. One should expect to observe modules as
slowly emerging from small subgraphs performing some
functional role (such as allowing bistability, or the creation
FIG. 7 Topological overlap matrix from the two DD models
considered here. (A) Solé's model; (B) Vázquez's model.
The modular architecture of the interaction maps has been
obtained close to the phase transition point (here $\delta = 0.5$).



of stripes in the early embryo) and adding more components
able to tune their performance and increase their
adaptability and robustness. In this way, compartments
performing specialized functions should be expected to
emerge. This might have been the case for the evolution
of ontogeny in neural circuitry: a process of parcellation
would have been shaping neural structures through
a mechanism involving segregation and isolation. It is
actually interesting to note that such a parcellation process
deals with two essential components: the presence
of some redundancy in cell-cell (neural) interactions,
followed by loss of one or more inputs to a cell. In other
words, we first need to have several neighboring neurons,
likely to have been obtained from cell duplication of a
common parental strain. The initial set of neurons will
be more densely connected and afterwards specialization
will occur by losing some links. This process is strongly
reminiscent of the one taking place in the proteome map,
although some fundamental differences are also present.

The proteome model provides a surprising counterexample
to this intuition. Here local rules are able to
shape some key features of global structure. Such a scenario
seems to be rather general, and might have implications
for the origins of metabolic paths too (Lehmann,
Ravasz and Wuchty, submitted paper). Instead of slowly
creating modules by significantly rewiring sub-parts of
the graph, modules appear to be present as a consequence
of the DD process. As illustrated by the previous figure,
proto-modules spontaneously emerge and are thus a
pre-pattern. Such a pre-defined structure could then be used
in order to perform cellular functions. It is interesting to
compare these structures with those present in technology
graphs ifl^lssf) .

What can be learned in general from this example? On
the one hand, this study provides an example of modularity
"for free": there is no need for natural selection
to fine-tune the system in order to obtain a large amount
of correlations. Close to the narrow domain of high deletion
rates, scale-free architecture emerges in a natural
way. Such a conjecture agrees with the view of evolution
as constrained and to some extent shaped by emergent
properties |l8L Isil) . But several relevant questions
emerge. One deals with the rates of link addition and removal.
Why are we observing these high rates leading to
a sparse graph? Two main possibilities emerge. One has
to do with the requirement of a sparse graph in order to
avoid dynamic instabilities. Specifically, if the activity of
the network is taken into account, positive and negative
links between different parts of a regulatory network can
trigger the emergence of chaotic dynamics fsj). Feedback
loops in particular are known to destabilize complex
networks, and a sparse graph would easily prevent them from
breaking the system's stability. By tuning the average degree,
selection might have reached a stable, robust network
with proto-modules embedded within its basic architecture.
The other possibility is that such proto-modules might have
been the real target of selection for a sparse graph. Modules
themselves isolate different parts of the system and thus
a mechanism favoring their emergence (even as proto-structures)
might have been successfully chosen. Further
studies should consider these possibilities by exploring
the internal organization of the proto-modules, to be compared
with the one observed in real maps.